Sketching and Clustering Metric Measure Spaces

Authors

  • Facundo Mémoli
  • Anastasios Sidiropoulos
  • Kritika Singhal

Abstract

Two important optimization problems in the analysis of geometric data sets are clustering and sketching. Here, clustering refers to the problem of partitioning some input metric measure space into k clusters, minimizing some objective function f. Sketching, on the other hand, is the problem of approximating some metric measure space by a smaller one supported on a set of k points. Specifically, we define the k-sketch of some metric measure space M to be the nearest neighbor of M in the set of k-point metric measure spaces, under some distance function ρ on the set of metric measure spaces.

In this paper we demonstrate a duality between general classes of clustering and sketching problems. We present a general method for efficiently transforming a solution for a clustering problem to a solution for a sketching problem, and vice versa, with approximately equal cost. More specifically, we obtain the following results. We define the sketching/clustering gap to be the supremum over all metric measure spaces of the ratio of the sketching and clustering objectives.

1. For metric spaces, we consider the case where f is the maximum cluster diameter, and ρ is the Gromov-Hausdorff distance. We show that the gap is constant for any compact metric space.
2. We extend the above results to obtain constant gaps for the case of metric measure spaces, where ρ is the p-Gromov-Wasserstein distance and the clustering objective involves minimizing various notions of the ℓp-diameters of the clusters.
3. We consider two competing notions of sketching for metric measure spaces, with one of them being more demanding than the other. These notions arise from two different definitions of p-Gromov-Wasserstein distance that have appeared in the literature. We then prove that whereas the gap between these can be arbitrarily large, in the case of doubling metric spaces the resulting sketching objectives are polynomially related.
∗Department of Mathematics, The Ohio State University, [email protected],edu.
†Department of Computer Science, University of Illinois at Chicago, [email protected].
‡Department of Mathematics, The Ohio State University, [email protected].

arXiv:1801.00551v1 [cs.CG] 2 Jan 2018
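
To make the two objectives in the abstract concrete, here is a minimal Python sketch on a finite metric space given as a distance matrix. It is an illustration under stated assumptions, not the paper's construction: the clustering is produced by the classical farthest-first traversal heuristic (a swapped-in, well-known k-center method), and the "sketch" is simply the k chosen centers with each cluster's total mass attached. All function names are hypothetical.

```python
from itertools import combinations


def farthest_first_centers(dist, k):
    """Pick up to k centers by farthest-first traversal (assumed heuristic)."""
    n = len(dist)
    centers = [0]  # arbitrary starting point
    d_to_centers = list(dist[0])  # distance from each point to nearest center
    while len(centers) < min(k, n):
        nxt = max(range(n), key=lambda i: d_to_centers[i])
        centers.append(nxt)
        d_to_centers = [min(d_to_centers[i], dist[nxt][i]) for i in range(n)]
    return centers


def assign_clusters(dist, centers):
    """Partition the points by nearest center (ties go to the earlier center)."""
    clusters = {c: [] for c in centers}
    for i in range(len(dist)):
        nearest = min(centers, key=lambda c: dist[c][i])
        clusters[nearest].append(i)
    return clusters


def max_cluster_diameter(dist, clusters):
    """Clustering objective f from item 1: the largest cluster diameter."""
    return max(
        (dist[i][j] for pts in clusters.values() for i, j in combinations(pts, 2)),
        default=0.0,
    )


def center_sketch(dist, weights, clusters):
    """A k-point metric measure space: centers with their cluster masses."""
    centers = list(clusters)
    sub_dist = [[dist[a][b] for b in centers] for a in centers]
    mass = [sum(weights[i] for i in clusters[c]) for c in centers]
    return sub_dist, mass


# Toy input: six points on a line with uniform measure (illustrative data).
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in points] for p in points]
weights = [1 / len(points)] * len(points)

centers = farthest_first_centers(dist, k=2)
clusters = assign_clusters(dist, centers)
sketch_dist, sketch_mass = center_sketch(dist, weights, clusters)

print("centers:", centers)
print("clusters:", clusters)
print("max cluster diameter:", max_cluster_diameter(dist, clusters))
print("sketch distances:", sketch_dist)
print("sketch masses:", sketch_mass)
```

The example only shows how a clustering induces a candidate k-point space; it does not compute the Gromov-Hausdorff or Gromov-Wasserstein distance, and it makes no claim about the constants in the gap.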

Similar resources

Composite Kernel Optimization in Semi-Supervised Metric

Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...

Impossibility of Sketching of the 3D Transportation Metric with Quadratic Cost

Transportation cost metrics, also known as the Wasserstein distances Wp, are a natural choice for defining distances between two pointsets, or distributions, and have been applied in numerous fields. From the computational perspective, there has been an intensive research effort for understanding the Wp metrics over R, with work on the W1 metric (a.k.a. earth mover distance) being most successfu...

Internal quality measures for clustering in metric spaces

This paper reviews clustering in metric spaces and some of the many and various fitness measures used to measure cluster quality. Experiments are undertaken to determine the correlation between these measures.

Metric Structures on Datasets: Stability and Classification of Algorithms

Several methods in data and shape analysis can be regarded as transformations between metric spaces. Examples are hierarchical clustering methods, the higher order constructions of computational persistent topology, and several computational techniques that operate within the context of data/shape matching under invariances. Metric geometry, and in particular different variants of the GromovHau...

Journal: CoRR
Volume: abs/1801.00551
Publication date: 2018